
Conversation

ldematte
Contributor

@ldematte ldematte commented Oct 10, 2025

This PR changes how we gather and compact vector data for transmission to the GPU. Instead of using a temporary file to write out the compacted arrays, we directly use the vector values from the scorer supplier, which are backed by a memory-mapped input. This way we avoid an additional copy of the data.
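Roughly, the merge path now looks like the sketch below (illustrative only: `GpuVectorUploader` and `FloatBufferSink` are hypothetical names standing in for the real classes in this PR, while `FloatVectorValues` is Lucene's per-field vector reader). The point is that the vectors are read straight from the already memory-mapped source instead of being copied into a temporary file first.

```java
import org.apache.lucene.index.FloatVectorValues;
import java.io.IOException;

final class GpuVectorUploader {

    /** Feeds each vector straight from the mmapped source to the GPU-side buffer. */
    static void upload(FloatVectorValues values, FloatBufferSink deviceBuffer) throws IOException {
        int dim = values.dimension();
        for (int ord = 0; ord < values.size(); ord++) {
            float[] vector = values.vectorValue(ord); // backed by a memory-mapped IndexInput
            deviceBuffer.append(vector, 0, dim);      // no temp-file round trip
        }
    }

    /** Hypothetical sink abstraction standing in for the actual GPU/cuVS transfer API. */
    interface FloatBufferSink {
        void append(float[] src, int offset, int length) throws IOException;
    }
}
```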

@ldematte ldematte requested a review from a team as a code owner October 10, 2025 15:14
@ldematte ldematte added >non-issue auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search test-gpu Run tests using a GPU v9.2.1 v9.3.0 labels Oct 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 10, 2025
@ldematte ldematte changed the title [Gpu] Optimize merge memory usage [GPU] Optimize merge memory usage Oct 10, 2025
Contributor

@mayya-sharipova mayya-sharipova left a comment


@ldematte Great work. I have not tested it yet, but the way you organized it is impressive. My main comment: do you think we can simplify this PR by breaking it into two separate ones, making this PR only about changes to merges, and doing the changes for flush, ResourcesHolder, and 128Mb in a separate PR? Or are these changes tightly coupled?

@ldematte
Contributor Author

> doing changes for flush, ResourcesHolder, 128Mb in a separate PR?

I can do that: here is the PR #136464

@mayya-sharipova
Contributor

@ldematte Great changes. I have done some benchmarking on my laptop with int8, and I see great recall but, surprisingly, no speedups compared with the main branch:

gist: 1_000_000 docs; 960 dims; euclidean metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|---|---|---|---|---|
| gpu main | 61422 | 69010 | 353 | 0.97 |
| gpu PR | 59035 | 67766 | 296 | 0.98 |

cohere-wikipedia_v2: 934_024 docs; 768 dims; cosine metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|---|---|---|---|---|
| gpu main | 48164 | 47657 | 384 | 0.99 |
| gpu PR | 47824 | 47354 | 393 | 0.99 |

Contributor

@mayya-sharipova mayya-sharipova left a comment


Great work, @ldematte

@ldematte
Contributor Author

@mayya-sharipova I also expected speed-ups on force merge; it seems to be a bit better, but the gain is a few percent, not a multiple.
I think this could be better in a "real" scenario (maybe even Rally), where the disk is contended (search ops, translog, etc.); in these benchmarks we have exclusive use of the drive.
I simulated contention by adding a background copy operation to keep the disk somewhat busy, and there the difference is more noticeable. Still a few percent, not a multiple, but at least you can tell it's there and not just noise.

@ldematte
Contributor Author

@mayya-sharipova I updated the merge as agreed, to avoid using device memory directly due to the cuVS bug.
I'll wait for your re-review; you can just look at the latest commit. Thanks!
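For reference, a minimal sketch of what staging vectors into a separate host memory segment can look like, assuming the `java.lang.foreign` API; the class name and the actual cuVS handoff in this PR are elided here and the code below is only illustrative.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class HostStagingExample {

    /** Copies the merged vectors into an off-heap host segment instead of device memory. */
    static MemorySegment stage(float[][] vectors, Arena arena) {
        int dim = vectors[0].length;
        MemorySegment staging = arena.allocate(
                (long) vectors.length * dim * Float.BYTES,
                ValueLayout.JAVA_FLOAT.byteAlignment());
        long offset = 0;
        for (float[] vector : vectors) {
            MemorySegment.copy(vector, 0, staging, ValueLayout.JAVA_FLOAT, offset, dim);
            offset += (long) dim * Float.BYTES;
        }
        return staging; // handed to the GPU index build instead of a raw device pointer
    }
}
```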

Contributor

@mayya-sharipova mayya-sharipova left a comment


@ldematte Thanks, the latest changes that copy into a separate memory segment LGTM.
